Skip to content

COMPAT: Use actual isinstance check for pyarrow.Array instead of hasattr(.. 'type') duck typing #52830

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jun 2, 2023

Conversation

jorisvandenbossche
Copy link
Member

@jbrockmendel
Copy link
Member

seems reasonable

@jorisvandenbossche
Copy link
Member Author

In #52833 I need a similar check, but in cython. So if we do that, we can probably only define it in cython instead of in compat/pyarrow.py

@jorisvandenbossche
Copy link
Member Author

Moved it to cython given that we will likely need pyarrow in lib.pyx anyway (eg needed for #52833 as well)

@@ -93,10 +93,21 @@
from typing import Self
else:
from typing_extensions import Self # pyright: reportUnusedImport = false

try:
from pyarrow import (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC we expect optional dependencies to be present for mypy checks, but not sure. who would know?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's also fine with me, that would just simplify this a bit.

Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this solves the problem Brock and I ran into in the other PR, this leaves the wider problem unresolved.

There are a bunch of things we might want to type check for (objects are only one part, but there are also arrow specific dtype checks). Do you suggest to create a helper function for every check that we might need at one point? This would get convoluted very quick

@jorisvandenbossche
Copy link
Member Author

Can you be more specific? Which other specific checks are you thinking about? (beyond one of the main pyarrow classes)

@mroeschke
Copy link
Member

Possibly pyarrow Table and Scalars too. There might be a use case for checking for specific Scalar or Array subclasses too?

@mroeschke mroeschke added the Arrow pyarrow functionality label Apr 28, 2023
@phofl
Copy link
Member

phofl commented Apr 28, 2023

I was referring to checks like, checking for an array/scalar and afterwards checking for a specific pyarrow type (int, ...). This would require helper functions for all checks we might want

@jbrockmendel
Copy link
Member

I was referring to checks like, checking for an array/scalar and afterwards checking for a specific pyarrow type (int, ...). This would require helper functions for all checks we might want

IIUC most of these would be relevant when we already have an ArrowExtensionArray and want to check if a scalar can be held in an array of that dtype, e.g. for setitem-with-expansion. Most of our EAs have something like this, either _validate_scalar or _validate_setitem_value. I think the way forward is to standardize this, so the relevant check would go on ArrowExtensionArray and we wouldn't need to worry about these helpers

@jbrockmendel
Copy link
Member

@phofl OK to move forward here (pending green)?

@phofl
Copy link
Member

phofl commented May 5, 2023

Yep

@jbrockmendel
Copy link
Member

@jorisvandenbossche can you merge main and ping on green

@jorisvandenbossche
Copy link
Member Author

@jbrockmendel mypy is not happy with my attempt to add a type guard:

mypy.....................................................................Failed
- hook id: mypy
- duration: 132.62s
- exit code: 1

pandas/_libs/lib.pyi:46: error: Variable "pandas._typing.ArrowArrayTypeGuard" is not valid as a type  [valid-type]
pandas/_libs/lib.pyi:46: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
Found 1 error in 1 file (checked 1417 source files)

(and I can't directly seem to fix it)

OK with leaving the typing part out for a follow-up PR? (I can directly open one with what I tried here, so we can discuss the best approach with the typing experts)

@jbrockmendel
Copy link
Member

OK with leaving the typing part out for a follow-up PR? (I can directly open one with what I tried here, so we can discuss the best approach with the typing experts)

sounds good

@jbrockmendel jbrockmendel merged commit 3f2daf5 into pandas-dev:main Jun 2, 2023
@jbrockmendel
Copy link
Member

thanks @jorisvandenbossche

@mroeschke mroeschke added this to the 2.1 milestone Jun 2, 2023
@jorisvandenbossche jorisvandenbossche deleted the pyarrow-compat branch June 5, 2023 10:09
topper-123 pushed a commit to topper-123/pandas that referenced this pull request Jun 5, 2023
…ttr(.. 'type') duck typing (pandas-dev#52830)

* Use actual isinstance check for pyarrow.Array instead of hasattr(.. 'type') duck typing

* fix import

* move to cython

* add docstring

* try TypeGuard

* remove ArrowArrayTypeGuard
Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023
…ttr(.. 'type') duck typing (pandas-dev#52830)

* Use actual isinstance check for pyarrow.Array instead of hasattr(.. 'type') duck typing

* fix import

* move to cython

* add docstring

* try TypeGuard

* remove ArrowArrayTypeGuard
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arrow pyarrow functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants